Abstract. In statistical mechanics, the Potts model is a model for interacting
spins with more than two discrete states. Neural networks which exhibit features 
of learning and associative memory can also be modeled by a system of Potts spins. A 
spontaneous behavior of hopping from one discrete attractor state to another (referred 
to as latching) has been proposed to be associated with higher cognitive functions. 
Here we propose a model in which both the stochastic dynamics of Potts models 
and an adaptive potential function are present. A latching dynamics is observed in 
a limited region of the noise (temperature)-adaptation parameter space. We hence 
suggest noise as a fundamental factor in such alternations alongside adaptation. From 
a dynamical systems point of view, the noise-adaptation alternations may be the 
underlying mechanism for multi-stability in attractor based models. An optimality 
criterion for realistic models is finally inferred. 

Keywords: Computational neuroscience, Network dynamics, Classical Monte Carlo 

1. Introduction

Among statistical approaches to modeling neural networks, the Ising model, alongside other
binary models, has received a lot of attention as a maximum entropy pairwise model. An
instance of such binary models is the Boltzmann machine, which is a Monte Carlo version
of the Hopfield network. The Potts model [1] is essentially the generalization of the Ising
model to network units with more than two states and, like the Ising model, it first caught
attention for its richness in physical applications [ ]. Kanter was among the first to
generalize the application of the Ising model in neural networks with features of learning
and associative memory [3, 4] to the Potts model [5]. Some recent efforts have been dedicated
to estimating the storage capacity of the Potts model for associative memory [6, 7, 8, 9].
Ising models constructed based on recorded data from cultured cortical neurons have 
proven successful in providing a good description of the real data [10]. Although the 
quality and limitations of this model concerning pairwise correlations in larger networks 
are still under investigation [11], the Ising and Potts models are potentially capable 
of incorporating higher order correlations. Recently, these models with specific energy
functions have been found useful at many levels of image processing, including segmentation
of an image into its constituent regions and multi-scale analysis of image data [12].

In their 2002 article, Hauser et al suggested that a computational mechanism for 
recursion, which provides a capacity to generate an infinite range of expressions from a 
finite set of elements, is the only uniquely human component of the faculty of language 
[13]. This argument, together with considerations about the local and global circuitry of the
neocortex, is the basis of Treves and Roudi's proposal for a Potts model with a hopping 
behavior among global network states, given the discrete nature of these attractor states. 
"The trajectory . . . will essentially include periods close to attracting states . . . and rapid 
transitions between them. The system latches between attractors", as these authors 
describe it [14, 15]. The dynamics of their model comprises sets of differential equations 
that determine the activation and adaptation behavior of network units [ ]. Other 
reports have studied the structure of latching transitions [16, 17] as well as the issue of 
storing correlated patterns in such networks [18]. 

Interestingly, the latching problem in memory-based analyses bears a likeness 
to multi-stability problems, such as perceptual bi-stability: a phenomenon in which 
perception alternates between two distinct interpretations of an ambiguous stimulus. 
Moreno-Bote et al challenge in their study the mainstream models that ascribe 
alternations between dominance of two or more competing neural populations to some 
form of slow adaptation acting on the dominant population, that leads to a switch in 
dominance to the competing population (oscillator models). They propose noise as the 
main cause of alternations in their noise-driven attractor models and construct a neurally 
plausible and experimentally consistent attractor model [19]. There is a parallelism 
between the stochastic nature of dynamics in our model and noise in attractor models, 
as both models predict that alternations would cease in the hypothetical absence of 
noise: by {eliminating noise/approaching zero temperature} the system would {settle 
down/freeze} in one of the {two percepts/several stored patterns} and stay there
indefinitely. 

The model we present here is an alternative to the published approach by Treves 
[14], with the major distinction of enjoying a stochastic dynamics traditionally present 
in physical Ising and Potts models. In fact, we have used the Markov chain Monte Carlo 
algorithm for a network with the Gibbs probability measure. Additionally, thanks to 
an adaptive potential function the network maintains the adaptive quality of neuronal 
activity. The combination of these features results in a latching behavior, driven by
both noise and adaptation, with corresponding adjustable parameters: temperature and
adaptation time constant, respectively. The latching we observe here is consistently
qualified as a temporary retrieval of one stored pattern, followed by subsequent 
abandonment of that pattern and retrieval of another pattern. 

In theory, given the two parameters of temperature (noise) and adaptation, it is not 
evident at all how the latching behavior would be observed in different regions of the 
parameter space. A key finding here (from simulations) is that this hopping behavior is 
limited to a particular region of adaptation versus noise, beyond which the system either 
locks in a specific attractor state, or disorderedly fluctuates over various configurations 
without any pattern retrieval at all. Even within the very area where latching behavior
is observed, a privileged critical temperature ($T_c = 1$) inferred from statistical analysis
suggests another preference, allowing us to distinguish an optimal region of activity. A
comparison of the latching "quality" at such an optimal point with other sample points 
will also confirm our expectation of an optimal region. 

The emergence of a sharply distinct region of activity is by and large nontrivial, and 
a theoretical description of various network states in terms of analytic solution to the 
dynamics equations in our stochastic multi-state (Potts) network might be a difficult 
task. Instead, we will endeavor in our current report to identify and demonstrate various 
network states using simulations of networks with various scales and characteristic 
parameters. We will establish the robustness of the observed latching region in networks 
of various size scales in terms of various order parameters; examine the effect of 
simulation run time; corroborate the independence of the results from initial conditions 
and cue patterns; study the interplay of noise and adaptation in the near-optimal region; 
and propose an optimality criterion and identify its region. 

2. Overview of model 

A Potts network is a collection of M interacting units, each of which may be in one of 
multiple discrete states. It is actually a generalization of the Ising Model with units 
having more than two possible states. A unit may represent a single neuron or a neural 
population, having multiple states of activity (action potential, firing rate, etc.) modeled 
as multiple Potts states. 
In the model presented here, each unit may be in one of S + 1 possible states‡

$$s \in \{0, \ldots, S\}$$

consisting of one "null" state (s = 0) and S "genuine" states (s = 1, ..., S).§

2.1. Interaction of units 

The following energy function is defined for the network‖:

$$E = \frac{1}{2} \sum_{i=1}^{M} h_i^{s_i} \tag{1}$$

where

$$h_i^{s} = \sum_{j \neq i} \sum_{k,l=0}^{S} w_{ij}^{kl} \, u_{s k} \, u_{s_j l} \tag{2}$$

describes the energy associated with unit i being in an arbitrary state s, and $s_j$ denotes
the current state of the jth unit. $u_{sk}$ is defined as the following modification of
the Kronecker delta function:

$$u_{sk} = (S + 1)\,\delta_{sk} - 1 \tag{3}$$

which serves to compare two selected states of activity, s and k. It takes the value
S if s = k, and -1 otherwise, so that the sum over k = 0, ..., S is
zero. (Examine the case of S = 1, the Ising model.)
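
The properties of the comparison function in (3) are easy to check numerically. The following is a minimal sketch; the function name `u` and the variable names are illustrative choices, not taken from the original simulation code.

```python
def u(s, k, S):
    """Modified Kronecker delta of equation (3): S if s == k, otherwise -1."""
    return (S + 1) * (1 if s == k else 0) - 1


S = 10
# For every state s, summing u over k = 0..S gives S + S * (-1) = 0.
row_sums = [sum(u(s, k, S) for k in range(S + 1)) for s in range(S + 1)]

# With S = 1 the table reduces to the Ising product of +1/-1 spins.
ising = [[u(s, k, 1) for k in range(2)] for s in range(2)]
```

For S = 1 the table is [[1, -1], [-1, 1]], i.e. +1 for equal states and -1 for different ones, which is the familiar Ising interaction.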

There is also a weights matrix, $w_{ij}^{kl}$, defined in section 2.2, which determines the
relative preference of units i and j being in states k and l, respectively.



2.2. Learning rule 

A number of p patterns are stored in the network, with the weights matrix defined as
follows:

$$w_{ij}^{kl} = \frac{1}{(S+1)^2 M p} \sum_{\mu=1}^{p} u_{\xi_i^\mu k} \, u_{\xi_j^\mu l} \, (1 - \delta_{k0})(1 - \delta_{l0}) \tag{4}$$

in which $\xi_i^\mu$ represents the state of unit i in pattern $\mu$. Notice that a weight of zero is
associated with null states.
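
The learning rule can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the normalization `(S + 1)**2 * M * p` is our reading of the prefactor in (4), the exclusion of i = j follows the restriction j ≠ i in (2), and the names `hebbian_weights` and `patterns` are hypothetical.

```python
def u(s, k, S):
    """Modified Kronecker delta of equation (3)."""
    return (S + 1) * (1 if s == k else 0) - 1


def hebbian_weights(patterns, S):
    """Return w[i][j][k][l] following equation (4).

    patterns: list of p patterns, each a list of M unit states in 0..S.
    Entries with k == 0 or l == 0 (null states) stay zero.
    """
    p, M = len(patterns), len(patterns[0])
    norm = (S + 1) ** 2 * M * p  # assumed normalization
    w = [[[[0.0] * (S + 1) for _ in range(S + 1)]
          for _ in range(M)] for _ in range(M)]
    for i in range(M):
        for j in range(M):
            if i == j:
                continue  # no self-interaction, as in the sum over j != i
            for k in range(1, S + 1):
                for l in range(1, S + 1):
                    total = sum(u(xi[i], k, S) * u(xi[j], l, S)
                                for xi in patterns)
                    w[i][j][k][l] = total / norm
    return w


# Tiny example: M = 4 units, S = 2 genuine states, p = 2 patterns.
S = 2
patterns = [[1, 2, 0, 1], [2, 0, 1, 1]]
w = hebbian_weights(patterns, S)
```

The nested-list layout is chosen for readability; a simulation at M = 300 and S = 10 would more plausibly use an array library.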

Substituting (4) in (2) shows that if a unit takes up a state which is defined in a 
stored pattern, the energy associated with that unit will be locally maximized. We will 
use this feature in section 5 to implement a higher rate of occurrence for our stored 
patterns via an appropriate distribution function. 

‡ A more general (and realistic) condition is an inhomogeneous network in which S might
differ among units. We will not deal with such conditions here.
§ Terminology borrowed from [14].

‖ For convenience, we omit the negative sign common in physical notation.

2.3. Dynamics

To define a dynamics that is stochastic yet adaptive, we set the common
Boltzmann rate, $e^{\beta h_i^s}$ ($\beta > 0$), for the occurrence of state s in unit i, and adaptively
manipulate the "attractiveness" of a local attractor by virtually altering the energy
function, $h_i^s$, based on the recent activity of each unit-state.

To accomplish this using a Monte Carlo method of simulation, we randomly select a
unit i (which is in state $s_i$) in each iteration of the program, then choose a random state
r as a candidate for a transition from $s_i$ to r. The transition occurs with the following
probability (the Metropolis algorithm):

$$P(s_i \to r) = \begin{cases} 1 & \text{if } \tilde{h}_i^{r} \geq \tilde{h}_i^{s_i} \\ \exp\left[\beta\left(\tilde{h}_i^{r} - \tilde{h}_i^{s_i}\right)\right] & \text{otherwise} \end{cases} \tag{5}$$

where

$$\tilde{h}_i^{s} := h_i^{s} - h_i^{T,s}$$

represents an adapting potential, with $h_i^{s}$ coming from (2) and $h_i^{T,s}$ being an adapting
threshold with the following dynamics:

$$\tau \, \dot{h}_i^{T,k} = u_{s_i k} - h_i^{T,k}, \quad k = 1, \ldots, S; \qquad h_i^{T,0} = 0. \tag{6}$$

Notice that there is no adaptation mechanism for null states. 

The inverses of the parameters $\tau$ (adaptation time constant) and $\beta$ (inverse temperature)
represent the levels of adaptation and noise in the system, respectively.
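
The update scheme of this section can be sketched as follows. This is a minimal illustration, not the original program: the explicit Euler step of size `dt` for the thresholds, the choice to relax all thresholds after every single-unit update, and the names `metropolis_step`, `theta` and `toy_field` are assumptions. The local field is passed in as a function so that the sketch stays independent of how the weights are stored.

```python
import math
import random


def u(s, k, S):
    """Modified Kronecker delta of equation (3)."""
    return (S + 1) * (1 if s == k else 0) - 1


def metropolis_step(state, theta, field, beta, tau, dt, S, rng):
    """One single-unit Metropolis update, then an Euler step of the thresholds.

    field(i, s, state) returns the local field h_i^s; theta[i][k] is the
    adapting threshold of state k in unit i (theta[i][0] stays 0).
    """
    i = rng.randrange(len(state))
    r = rng.randrange(S + 1)
    h_new = field(i, r, state) - theta[i][r]
    h_old = field(i, state[i], state) - theta[i][state[i]]
    # The sign convention here maximizes h, so uphill moves are always accepted.
    if h_new >= h_old or rng.random() < math.exp(beta * (h_new - h_old)):
        state[i] = r
    # Relax the thresholds towards u_{s_i k}; null states (k = 0) do not adapt.
    for j, s_j in enumerate(state):
        for k in range(1, S + 1):
            theta[j][k] += dt / tau * (u(s_j, k, S) - theta[j][k])
    return state, theta


# Toy run: a hypothetical field that simply favors state 1 in every unit.
S, M = 3, 5
rng = random.Random(0)
state = [rng.randrange(S + 1) for _ in range(M)]
theta = [[0.0] * (S + 1) for _ in range(M)]
toy_field = lambda i, s, st: 1.0 if s == 1 else 0.0
for _ in range(200):
    metropolis_step(state, theta, toy_field, beta=2.0, tau=5.0,
                    dt=1.0 / M, S=S, rng=rng)
```

With `dt = 1 / M`, M iterations advance the thresholds by one unit of time, matching the convention used later that one "step" is M single iterations.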



3. Simulation and analysis 

Networks of various scales (M = 100, 300, 600 and 900) with S = 10 were simulated
over a domain of noise-adaptation combinations. Throughout this study, the value
S = 10 is used, unless otherwise stated. A number of p patterns (ten in the runs shown in
figures 1, 3 and 4) were stored in
each network. Patterns were generated following the method described in [1 1] which is 
capable of producing non- to highly-correlated patterns with desired levels of complexity 
and common units. In each pattern, a fraction of a = 0.5 units were set to be in 
genuine states, with others being in null state. For the following studies, the correlation 
determinant factor (£) was set to zero to produce uncorrelated patterns. For more 
details see the supplementary material. 



3.1. Overlaps behavior 

A primary quantity of interest, $O_\mu$, is the pattern retrieval reflected in the overlap
(similarity) of the current state of the network with the stored pattern $\mu$. It is simply
measured by counting the number of common genuine unit-states between the current
configuration of the network and each stored pattern, and then normalizing the result:

$$O_\mu = \frac{1}{aM} \sum_{i=1}^{M} \delta_{s_i \xi_i^\mu} \left(1 - \delta_{s_i 0}\right) \tag{7}$$
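
The counting described above can be sketched as follows. The normalization by the number of genuine units in a pattern, a·M, is an assumption made here so that a perfectly retrieved pattern has overlap 1; the function name `overlap` is illustrative.

```python
def overlap(state, pattern, a):
    """Overlap of the current configuration with one stored pattern.

    Counts units whose state equals the pattern's state and is genuine
    (non-null), then divides by a * M (assumed normalization).
    """
    M = len(state)
    common = sum(1 for s, xi in zip(state, pattern) if s == xi and s != 0)
    return common / (a * M)


pattern = [1, 2, 0, 0]                               # a = 0.5: half genuine
full_retrieval = overlap([1, 2, 0, 3], pattern, a=0.5)   # both genuine units match
null_out = overlap([0, 0, 0, 0], pattern, a=0.5)         # all units in the null state
```

Note that matching null states do not contribute, so a configuration with every unit in the null state has zero overlap with every pattern.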

The resulting variations of the overlaps, $O_\mu$, over time are depicted in figure 1 for three
different pairs of $\beta$ and $\tau$. With a proper selection of the noise and adaptation
($\tau^{-1}$) parameters, a latching behavior is observed in the overlap diagrams as the system
hops from one retrieved pattern to another (figure 1, middle). Other types of behavior
were also identified, in which the system is either underactive and frozen in a single
pattern (figure 1, top), or overactive with no pattern retrieval (figure 1, bottom).



3.2. Fluctuations landscape 

To investigate the overall behavior of the network for each possible combination of noise
and adaptation parameters, the average variance of the overlaps

$$\sigma_O^2 = \frac{1}{p} \sum_{\mu} \left( \langle O_\mu^2 \rangle_t - \langle O_\mu \rangle_t^2 \right)$$

was measured over a wide grid of noise-adaptation sample points, where $\langle \ldots \rangle_t$
denotes averaging over a sufficiently long period of time at each point. The result is
depicted in figure 2 (top) for a network of size M = 300 units, with S = 10 for each
unit.

Fluctuations of the total energy E (see equation (1)), another order parameter,
were also measured over the same grid points using the variance

$$\sigma_E^2 = \langle E^2 \rangle_t - \langle E \rangle_t^2 .$$

The result is plotted in figure 2 (bottom).
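
Both fluctuation measures are time variances of recorded traces. A minimal sketch, assuming the traces are stored as plain lists and that the overlap variance is averaged over the p patterns (the prefactor is our reading of the formula); the function names are illustrative.

```python
def time_variance(series):
    """<x^2>_t - <x>_t^2 for one recorded time series."""
    n = len(series)
    mean = sum(series) / n
    mean_sq = sum(x * x for x in series) / n
    return mean_sq - mean * mean


def overlap_fluctuation(overlap_traces):
    """Average of the time variances of the overlap traces, one per pattern."""
    return sum(time_variance(t) for t in overlap_traces) / len(overlap_traces)


# One pattern that flips between retrieved and not retrieved, one that is frozen.
sigma_o_sq = overlap_fluctuation([[0, 1, 0, 1], [1, 1, 1, 1]])
sigma_e_sq = time_variance([2.0, 2.0, 2.0])  # a constant energy trace
```

A frozen trace contributes zero variance, which is why both the frozen and the overactive regions show low overlap fluctuations.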

As shown in figure 2, the confined region of maximum fluctuations is the region
in which latching behavior occurs. Looking at the sample points A, H and D studied in figure 1
confirms our expectation that the lower left section of the plot is in fact a frozen region
of activity if considered over a sufficiently short period of time compared to the adaptation
time constant (see section 4). The rest of the landscape belongs to an overactive, or dead,
region of pattern retrieval. At all three points C, D and E, the behavior of the overlaps
diagram is similar, at least in appearance (see figures 1 and 4; the graph for point C
looks similar and is not shown for brevity). In this region, the system is too active
in terms of unit-state fluctuations (see the $q_{EA}$ parameter, section 4) for any pattern to be
retrieved, which, ironically, means that it is dead in terms of pattern retrieval.

To reveal more details about the behavior of the network with various combinations 
of noise-adaptation parameters, several other sample points were labeled in figure 2 and 
their overlaps graphs were sketched. The points were chosen to be cases with minimal
noise (figure 3) or very slow adaptation (figure 4).

Figure 1: Overlaps between the network status and the ten stored patterns in a network
of 300 units change over time. Notice that a pattern ($\mu = 0$) is used as an initial cue in
each run of the program. P = (x, y) in each title means $\beta = 10^{-x}$ and $\tau = 10^{-y}$. You
can find the corresponding labels in figure 2.



4. Size scaling, run time and initial conditions 

The overall behavior of the system is invariant with respect to various network sizes: 
Several sections of figure 2 were selected and replotted for different network sizes, 
M = 100, 300, 900. Some of these sections are depicted in figure 5. The corresponding 
regions of activity evidently match in different size scales. 

At this stage of the study, a third order parameter besides $\sigma_O$ and $\sigma_E$ was also
examined, which provided a better understanding of the observed regions of activity.



Optimal Latching in a Potts Model 



Figure 2: Overlaps (top) and energy (bottom) fluctuations suggest a limited region
of latching activity within the domain of noise ($-\log\beta$, horizontal axis) and adaptation
($-\log\tau$) parameters.

Figure 3: Overlaps between the network status and the ten stored patterns in a network
of 300 units change over time. These graphs show three cases with a common, relatively
low noise value. Adaptation, though, is different, decreasing from the top panel to the
bottom. P = (x, y) in each title means $\beta = 10^{-x}$ and $\tau = 10^{-y}$. You can find the
corresponding labels in figure 2.



Figure 4: Overlaps between the network status and the ten stored patterns in a network
of 300 units change over time. These graphs show three cases with a common, relatively
slow adaptation. Noise, however, is different, decreasing from the top panel to the
bottom. At these points no latching behavior is observed. P = (x, y) in each title
means $\beta = 10^{-x}$ and $\tau = 10^{-y}$. You can find the corresponding labels in figure 2.

The Edwards-Anderson order parameter, defined as

$$q_{EA} = \frac{1}{M S (S+1)} \sum_{i=1}^{M} \sum_{k=0}^{S} \langle u_{s_i k} \rangle_t^2 \tag{8}$$

is also plotted in figures 5c and 5f for various sizes of the network in sections passing
through different regions. The figures reveal that in the region of high $\sigma_O$ and high $\sigma_E$,
the parameter varies gradually from its maximum to its minimum value. To see what
$q_{EA}$ measures, notice that $\sum_k \langle u_{s_i k} \rangle_t^2$ in equation (8) reflects the average state of a
unit i in time: it takes the value $S^2 + S$ if the unit is in a fixed state $s_i$, and vanishes if the
unit is randomly and uniformly fluctuating between all states (cf the definition of $u_{sk}$
in (3)). Averaging this over all units and normalizing such that the maximum value is 1
yields equation (8). This quantity is hence a better indicator of overall network activity,
as it clearly distinguishes between active and silent network states. Therefore, it is the
high value of $q_{EA}$, more than the low value of $\sigma_O$, that indicates the frozen region.
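
The two limiting cases of equation (8) can be verified with a short sketch. The trajectory format (a list of configurations, each a list of unit states) and the function name `q_ea` are assumptions made for illustration.

```python
def u(s, k, S):
    """Modified Kronecker delta of equation (3)."""
    return (S + 1) * (1 if s == k else 0) - 1


def q_ea(trajectory, S):
    """Edwards-Anderson parameter of equation (8) from recorded configurations."""
    T, M = len(trajectory), len(trajectory[0])
    total = 0.0
    for i in range(M):
        for k in range(S + 1):
            # Time average of u_{s_i k} for unit i, then squared.
            mean_u = sum(u(conf[i], k, S) for conf in trajectory) / T
            total += mean_u ** 2
    return total / (M * S * (S + 1))


S = 2
frozen = [[1, 2, 0]] * 5                       # every unit keeps its state
uniform = [[t % 3] * 3 for t in range(6)]      # every unit visits all states equally
q_frozen = q_ea(frozen, S)
q_uniform = q_ea(uniform, S)
```

For a frozen unit the inner sum is S^2 + S, so the normalization yields exactly 1; for a unit that visits all S + 1 states equally often, every time average of u vanishes and the parameter is 0.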

Figure 5: Several order parameters in various sections of the noise-adaptation landscape
(inset, see figure 2) are examined for different network sizes. Panels (a), (c), and (e) are
vertical sections with a fixed value of $\beta$ (varying adaptation). Panels (b), (d), and (f)
are horizontal sections with a fixed adaptation (varying noise). A perfect consistency is
observed.

Figures 5c and 5f suggest that the shape and extent of the regions are robustly
preserved under size scaling in the limited-sized networks that were studied. However,
changing the simulation run time has a totally different effect on the extent of some
regions. Figure 6 shows a selection of the same landscape as figure 2, obtained
through a much longer run time at each point. All the figures so far were obtained with
a run time of 300 "steps," with each step here being M single iterations of the program.
Figure 6, however, is the result of 5000 steps at each point. In the left panels, the data
from the initial 1500 steps of the simulation was ignored. The right panels were created
using the full 5000 steps. Not much difference is observed between the left and right
panels. The landscape view of $q_{EA}$ was also plotted this time.

A noticeable difference between the short and the long simulation runs is observed.
In the long runs, the region in which overlap fluctuations are observed extends further
downwards, towards slower adaptation (longer adaptation time constants). In other
words, as we decrease the adaptation, the effect of even a small adaptation is still significant
in the total overlap and energy fluctuations because, although the retrieved patterns
last for longer times, they finally switch to other patterns (non-optimal latching). The
value of $q_{EA}$ also increases more gradually in the long runs.

Consequently, one can argue that in reality there are no distinct phases or phase
transitions if we look at the system over a sufficiently long time window, just a dynamics
that slows down as the adaptation slows down. However, the maximum actual
time scale of a real neural system sets an upper limit to the adaptation time constant,
above which the system may be "effectively" frozen, or overactive, depending on the
noise value. Moreover, an increase occurring in all the order parameters begins at around
$-\log\tau = -1.7$, independently of the run time and network size. This may also be
considered a phase change phenomenon. In fact, at around this point the dynamics is
extremely sensitive to variations in $\tau$. With slow enough adaptation the system will have
enough time to fully retrieve patterns. However, the lower limit for the adaptation time
constant (upper limit for adaptation speed) is, once again, systematically determined.
If $\tau$ gets too small, the lifetime of retrieved patterns tends towards one time "step." This
means the adaptation is so fast that some parts of a pattern de-adapt before the whole
pattern is retrieved, thus giving no time for the attractor state to rise. This "step" is
an intrinsic property of real systems.

To ensure that our results are independent of initial conditions and cue
patterns, a section of figure 2 was reexamined using a run period of 9000 steps, with
various random initial conditions and cue patterns. The data from the first 3000 of the
9000 steps were discarded. Four different cue patterns with two random
initial conditions for each pattern were tried at each point (8 trials). The resulting
standard deviations are shown in figures 7 and 8 with error bars. The inset graphs show
the corresponding section of study in figure 2.

Figure 6: The noise-adaptation landscape for 5000 steps. In the left panels, some initial
portion of the data is ignored. Compare with figure 2 where the run time is 300 steps.

Figure 7: Reliability check: deviations from mean values over various initial conditions
and various cue patterns are shown with error bars. The run period is 9000 steps
compared to 300 in figures 5 and 2 (the inset). Also, the initial 3000 steps of each run are
ignored. In these sections, the peak in the inset graph is extended more towards slower
adaptations because in the longer runs, the non-optimal latching will still be observed
with slower adaptations.

Figure 8: Continued from figure 7.

The behavior of the error bars in the lower panel of figure 7 is noteworthy.
In our effort to understand the large variations in the error bars, especially the sudden
change from -3.7 to -3.85, we simulated again and examined our data for energies and
overlaps at these points. The usual behavior of the system at these points is shown in
figures 9 and 10, top and middle panels, for randomly selected trials (initial conditions).
The graphs do not appear to differ much. However,
we noticed that the huge error bars are the result of a few occurrences of a behavior
that appeared in some trials, as in figures 9 and 10, bottom. It appears that the
system virtually "nulls out" sometimes. By the definition of overlaps, equation (7), the
null states are excluded from the overlap calculation, so the overlaps should vanish if all
the units are in null states. To understand the behavior of the energy graph, notice
that equations (1) to (4) tell us that the energies assigned to the null states are zero.
However, the -1 values of $u_{sk}$ that appear in the sum in equation (2) make a nominal
contribution to the total energy, making it slightly off zero. The system gets out of this
"resting state" merely due to de-adaptation of other states and noise. The null-out did
not occur in our trials at $-\log\tau \le -3.85$ in figure 7, hence the minimal error bars there. It
occurs more frequently, and lasts for shorter periods of time, as $-\log\tau$ gets larger, hence the
decreasing error bars to the right.

Another interesting feature of figure 7 is the difference between the energy and
overlaps peaks, or rather bumps. While the rise in both graphs begins at about the
same point on the right extreme of the panels, for the reason explained before,
the $\sigma_O$ graph drops a bit later than the $\sigma_E$ graph on the left. It is also much smoother
than the $\sigma_E$ graph. This behavior can again be understood by referring to figures 9 and
10. In these figures we see several transitions between patterns with very close energies.
Such transitions make up a significant portion of $\sigma_O$, while in terms of energy they amount
to little fluctuation.

Figure 9: Overlaps behavior over time at points where the error bars in figure 7 change
suddenly. Typical behaviors are shown in the top and middle panels. In the bottom
panel, a null-out effect is observed.



5. Behavior at around $\beta = 1$

Two different horizontal sections of figure 6 (left panels) were selected for more detailed
study in the region where noise has a considerable effect (figure 11). The section at
$-\log\tau = -4.65$ is where adaptation is relatively slow, and the section at $-\log\tau = -2.25$
is where both noise and adaptation play a critical role in the behavior of the system
(at around point H in figure 6). More specific parameters are explained in the figure
captions.

Figure 10: Energy behavior over time at points where the error bars in figure 7 change
suddenly. Typical behaviors are shown in the top and middle panels. In the bottom
panel, a null-out effect is observed.

Figure 11: Two horizontal sections of figure 6 (insets). In the left panels, the run time
is 9000 steps, with the first 3000 steps ignored. In the right panels, the run time is 5000
steps, with the first 1500 steps ignored. All the inset graphs are from figure 6 (left panels) with
5000 steps run and 1500 ignored. The right panels in the above figure completely match
the inset graphs. The error bars here are calculated in the same fashion as in figures 7 and
8, i.e. 8 trials with different initial conditions/cue patterns.

To understand the behavior of the graphs in the right panels of figure 11, we
choose to explain the overlap behavior in panel 11b as $-\log\beta$ decreases. At around
$-\log\beta = 0.1$ the noise is so strong that it does not allow any patterns to show up
(figure 12, top). A phase transition at around $\beta = 1$ is characteristic of Ising
models. A classical two-dimensional q-state Potts model also exhibits a phase transition
when $\exp(\beta) - 1 = \sqrt{q}$ [20]. The phase transition beginning at around $\beta = 1$ is therefore not
surprising. At around $-\log\beta = -0.05$ some jittering begins to show up (figure 12,
middle). Notice that we are still close to the high-noise border, and the adaptation
is slow but not zero, so it facilitates transitions induced by noise. This results in the
first peak at around 0. At around -0.2 the temperature is low enough for the retrieved
patterns to stabilize (figure 12, bottom). Thus the overlap fluctuations decrease again
at this point. However, a glance at the $q_{EA}$ graph in panel 11f reveals that although
about $aM$ (= 0.5 · 300) units are fixed in the primary retrieved pattern, the
rest of them are still fluctuating freely between various states. This can be seen better
when the rest of the system is attracted to secondary patterns that are partially retrieved,
as shown in figure 13, top. This pattern retrieval needs a slightly lower temperature
to occur, and the partially retrieved patterns are, as in the first peak, jittery and
transient. This results in a smaller peak at around -0.35. As we further decrease the
temperature, both partial and full pattern retrievals become solid and stable (figure 13,
middle), resulting in low $\sigma_O$ and $\sigma_E$ values again. Here, we notice a "life-shortening"
effect of noise on pattern retrievals. It can be observed in all of our data (including
those not presented herein) that increasing noise alone results in a higher probability of
pattern transitions, hence shorter retrieval lifetimes.
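
For orientation, the two-dimensional Potts criterion quoted above, exp(β) − 1 = √q, can be evaluated directly. The sketch below is only indicative: that criterion holds for a nearest-neighbor lattice with unit coupling, whereas the network studied here couples every pair of units through the learned weights, so the numbers are not predictions for this model. The function name `potts_beta_c` is illustrative.

```python
import math


def potts_beta_c(q):
    """Inverse critical temperature solving exp(beta) - 1 = sqrt(q)."""
    return math.log(1.0 + math.sqrt(q))


beta_c_two_state = potts_beta_c(2)     # q = 2: ln(1 + sqrt(2)), about 0.881
beta_c_eleven = potts_beta_c(11)       # q = S + 1 = 11 states, as used here
noise_axis_two_state = -math.log10(beta_c_two_state)  # position on a -log(beta) axis
```

In both cases the critical inverse temperature is of order one, which is the only point the comparison in the text relies on.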

Now, we turn our attention to the other selected section, where $-\log\tau = -2.25$
(figure 11, left panels). Adaptation is faster in this section, so with decreasing
temperature the first patterns show up at a lower temperature. The ascents in the
$\sigma_O$ and $q_{EA}$ graphs look quite simple. The $\sigma_E$ graph, however, shows an interesting peak
immediately after the rise. Recalling that close-energy pattern transitions can account
for overlaps activity with little energy fluctuation, we conclude that this peak signifies
the most diverse pattern activity in terms of energy fluctuations. An instance of overlaps
activity at (-0.25, -2.25) is shown in figure 13, bottom. If we look back at figure 3 we
can see what happens if we further decrease the noise. We see that although pattern
fluctuations may be fast due to fast adaptation, several patterns may rise at a time, that
is, secondary and higher order pattern retrievals are observed at low temperatures. This
"purifying" effect of noise was also observed in our study of the section $-\log\tau = -4.65$.
Here, our results confirm that for a pure, distinct pattern retrieval we need $\beta$ close
enough to 1.

We are now ready to articulate our optimality criterion and specify its region.
We define a utility function such that optimal latching corresponds to the maximum of the
utility function. Among various possibilities, we take our utility function $U(\beta, \tau, T)$ to be the
number of transitions between uniquely retrieved patterns over a given run time T.
A pattern $\mu$ is "uniquely" retrieved when, for some high and low thresholds $T_H$ and
$T_L \in [0, 1]$, we have $O_\mu > T_H$ and $O_\nu < T_L$ for all $\nu \neq \mu$. A "transition" occurs when
a uniquely retrieved pattern is replaced by another. We calculate U by counting the
number of transitions.
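
The counting rule can be sketched as follows. The threshold values 0.8 and 0.2 are placeholders, not values from this study, and the sketch assumes that an interval with no uniquely retrieved pattern does not by itself count as a transition; the function names are illustrative.

```python
def uniquely_retrieved(overlaps, t_high, t_low):
    """Return the index of the uniquely retrieved pattern, or None."""
    for mu, o in enumerate(overlaps):
        others_low = all(x < t_low for nu, x in enumerate(overlaps) if nu != mu)
        if o > t_high and others_low:
            return mu
    return None


def count_transitions(overlap_history, t_high=0.8, t_low=0.2):
    """Utility U: number of changes of the uniquely retrieved pattern."""
    transitions = 0
    last = None
    for overlaps in overlap_history:
        current = uniquely_retrieved(overlaps, t_high, t_low)
        if current is None:
            continue  # no unique retrieval at this time point
        if last is not None and current != last:
            transitions += 1
        last = current
    return transitions


# Two patterns: 0 is retrieved, then 1, then 0 again, with a mixed state between.
history = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.1, 0.95], [0.9, 0.1]]
utility = count_transitions(history)
```

Co-occurring retrievals, where two overlaps are high at once, are rejected by the low-threshold test and therefore never contribute to U.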

The above utility function immediately excludes the dead/overactive region as not
optimal. It also demands the fastest latching dynamics. By "fast" we mean that the life
span of retrieved patterns and the transition time between retrievals are short. This
requires that adaptation be maximal. The uniqueness condition for retrievals makes the
noise maximal, too. With a proper selection of $T_H$ and $T_L$, and a fixed T, the optimal
region should be confined to around point H, or the bottom panel in figure 13.

Figure 12: Overlaps behavior at some select points in figure 11 right panels. The top
panel shows a dead/overactive dynamics. As $-\log\beta$ decreases, pattern retrieval begins
(middle) and the retrieved patterns solidify as we further decrease the noise (bottom).

Other optimality criteria can also be suggested. One can simply take $\sigma_E$ as
the utility function since it has an absolute maximum at around (-0.1, -3) (cf figure 6).
The overlaps and energy behavior at this point are plotted in figure 14. Interestingly,
we see that the retrieval periods are short, and the system spends considerable time
in an overactive/dead state (not null-out, compare with figures 9 and 10, bottom) during
transitions, where the average energy is almost zero.

Figure 13: Overlaps behavior at some select points in figure 11. The top panel shows
a secondary pattern retrieval that occurs when noise is low enough. The primary and
secondary retrieved patterns solidify as we further decrease the noise (middle). In the
bottom panel, the interplay between noise and adaptation is high, and the latching
behavior is close to optimal. (Panel titles recovered from the figure: "At (-0.35, -4.65)"
and "At (-0.25, -2.25)"; horizontal axis: t.)

As yet another option, one may look for the most "diverse" transitions as being
optimal. By diverse, we mean having maximum randomness in terms of maximum
entropy rate (assuming a stationary distribution):

$$H = -\sum_{\mu} p_\mu \sum_{\nu} P_{\mu\nu} \log P_{\mu\nu}$$

where $p_\mu$ is the retrieval rate of pattern $\mu$, and $P_{\mu\nu}$ is the transition matrix. The search
for this region is left to future studies.
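
As an illustration of the entropy-rate criterion, the quantity can be estimated from a recorded sequence of retrieved pattern labels. The sketch assumes the standard Markov-chain entropy rate, with the retrieval rates and the transition matrix both estimated from empirical counts; the function name `entropy_rate` and the input format are hypothetical.

```python
import math
from collections import Counter


def entropy_rate(sequence):
    """Estimate -sum_mu p_mu sum_nu P_mu_nu log P_mu_nu from pattern labels."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    from_counts = Counter(sequence[:-1])
    n = len(sequence) - 1  # number of observed transitions
    h = 0.0
    for (mu, nu), c in pair_counts.items():
        p_mu = from_counts[mu] / n       # empirical retrieval rate of mu
        p_trans = c / from_counts[mu]    # empirical P_mu_nu
        h -= p_mu * p_trans * math.log(p_trans)
    return h


periodic = entropy_rate([0, 1, 2, 0, 1, 2, 0])   # fully predictable cycle
mixed = entropy_rate([0, 0, 1, 1, 0])            # each next label is a coin flip
```

A strictly periodic retrieval sequence has zero entropy rate, so this criterion would favor a different region than one that rewards regular periods.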

Figure 14: Overlaps (top) and energy (bottom) behavior when $\sigma_E$ is at maximum.
Overactive periods separate the retrievals.



6. Discussion 

In this work, we have constructed a model combining two major characteristics from 
apparently separate disciplines. Our model possesses two major components: a 
temperature parameter and an adaptation one. The former is a primary constituent 
of a thermodynamical and statistical-physics framework, while the latter represents a 
major quality of real neural networks and plays an important role in the dynamics of 
realistic models suggested to date for the study of numerous phenomena in the brain. 
Figure 2 reveals how these two basic components are joined to form a novel perspective: 
a latching behavior confined to a limited region of the parameter space. 

A construction of Ising networks based on data from real retinal neurons suggests a 
preferred working temperature at around $\beta = 1$ [21]. A phase transition at this point is 
also observed in our model. The other basic parameter in our model, the adaptation 
time-constant, also plays a key role in determining the type of network activity. The region 
where latching behavior occurs is limited in terms of noise and adaptation. However, 
more specific optimality criteria can be suggested to narrow down the desired area. The joint 
analysis of the two basic components, temperature and adaptation, singles out a critical 
region of optimal activity at around point H in figure 2. A comparison of the latching 
behavior at a sample point in this zone, such as H (figure 1, or 13 bottom), with several 
other points of latching possibility, such as I, B or J (figure 3), reveals how indeed 
the optimal region is privileged: the retrieval sequence at H exhibits fast and pure 
emergence of distinct patterns with regular periods, in contrast to co-occurring retrievals 
and indistinct, irregular transitions at other sample points. The findings here suggest 
that, in realistic models that incorporate an adaptation mechanism, the respective time 
constants and the amount of noise might need to be limited to ranges that 
are compatible with the overall functionality of the network. 

A rich variety of dynamical states are observed in different regions of the phase 
diagram. From a grammatical point of view, a traditional latching behavior occurs when 
a retrieval is cued by the previous retrieval, as in figure 3. However, with a sufficient 
presence of noise in the system, the network tends toward a spontaneous activity in which 
pattern retrievals are more or less cued by noise. This is most noticeable in figure 14. In 
cases like point H, figure 1 middle, the transitions are highly noise driven, though the 
chain is not totally memory-less given the exponential recovery of adapted unit-states. 
Hence a deeper understanding of the boundaries and grammatical characteristics of 
these two types of behavior is needed in future work. 

Moreover, there are two types of dynamical states observed so far that can separate 
retrieval chains: overactive states (figure 14) and null-outs (figure 10 bottom). The 
former is typical of the overactive/dead region where unit-states fluctuate too rapidly 
to form patterns. The latter occurs when noise is low enough for units to settle in null 
states, when the system is "tired" of recently retrieved patterns. This is, however, not 
a favorable state compared to pattern energy levels, hence it is a temporary state even 
though the null states do not adapt. Further work is required to verify these speculations 
and determine the rate and lifetime of such states. 

Another interesting dynamical state is the 'hierarchical' pattern retrieval 
exemplified in figures 13 top, and 3. In this sort of dynamics, "one state is retrieved, 
serving as a framework for other states to be partially retrieved one after the other in the 
meantime," as described by a referee for this article. This, too, seems very promising 
in terms of grammatical significance, though further analysis falls outside the scope 
of this article and remains for future studies. 

As shown in section 5, noise has a shortening effect on retrieval lifetime, or, an 
increasing effect on the rate of transitions. In fact, noise is an essential constituent of 
the dynamics, and unit-state transitions stop shortly as $\beta \to \infty$ (cf. equation (5)). This 
accords well with recent models [19] in which alternations in dominant patterns 
of neural activity are induced by noise, while adaptation would not lead to alternations 
in the absence of noise. What is important in this scenario is that instead of an ad 
hoc assumption about the presence of noise, it is the interplay between adaptation 
and noise which sets the timescale of alternations. The transition 
probabilities between different attractor states need not be at the scale of the biophysical 
noise sources, which are characterized by fast timescales. That would indeed be unrealistic, 
given that the latching state of the network is meant to support transition states 
corresponding to the highest cognitive states. In terms of the state space and energy 
landscape, the noise-adaptation interplay will shift the boundaries between basins 
of attraction as well as reduce the depth of the minima associated with dominant 
patterns [19]. Given the optimal region in the noise-adaptation state space for the maximum 
rate of transitions, there is room for a realistic rate of alternations by varying 
noise and adaptation rates in the appropriate domain. In a similar vein, Kumar et 
al [22] have emphasized the rate of noise in shifting the dynamics in favor of spiking 
activity propagation in neural networks. The idea of a feed-forward network embedded 
in a recurrent network and hence the possibility of alternating patterns of activity in 
the form of a packet of synchronous neural activity bears a close resemblance to the 
hopping behavior of different attractor states in the Potts model. It will be interesting 
to see how the noise-adaptation interplay may play a similar role in controlling different 
activity modes in such embedded feed-forward networks. 
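
As noted above, unit-state transitions stop as the inverse temperature $\beta$ grows. This freezing can be illustrated with a small numerical sketch. It assumes a heat-bath (softmax) update over the states of a single unit, which is a common choice for Potts dynamics but is an assumption here and not a reproduction of equation (5). The local field values and the helper names `heat_bath_probs` and `escape_probability` are hypothetical.

```python
import numpy as np


def heat_bath_probs(fields, beta):
    """Softmax probabilities over unit states for the given local fields.

    Assumption: state k is chosen with probability proportional to
    exp(beta * fields[k]); this is an illustrative update rule only.
    """
    z = beta * np.asarray(fields, dtype=float)
    z = z - z.max()          # subtract the maximum for numerical stability
    w = np.exp(z)
    return w / w.sum()


def escape_probability(fields, beta):
    """Probability that one update leaves the state with the largest field."""
    return 1.0 - heat_bath_probs(fields, beta).max()


if __name__ == "__main__":
    # S = 10 states; one state is favored by a unit field gap (hypothetical values).
    fields = np.array([1.0] + [0.0] * 9)
    for beta in (0.0, 0.1, 1.0, 10.0, 100.0):
        print(beta, escape_probability(fields, beta))
```

Under these assumptions the escape probability decreases monotonically with $\beta$ and vanishes in the zero-noise limit, so without noise a unit stays locked in the state with the largest field.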

The "Potts" virtue of this model, which lies in the multiplicity of states of each 
unit, plays a dramatic role in determining the shape and extent of latching region(s). 
The parameter S was fixed at 10 throughout this study; the effect of varying it 
remains a target for future studies. Moreover, a thorough analysis of the 
transition structure in the retrieval sequence is required to illuminate the potential of 
the network for grammatical association and sequence generation. Any such analysis 
should preferably be performed around the optimal region where the retrievals are unique, 
frequent enough, and of high signal-to-noise quality. 

Acknowledgments 

The authors would like to thank Yasser Roudi for his insightful comments and critical 
assessment, and Mohammad Reza Razvan for helpful suggestions at the early stage of 
this work. We also appreciate the critical comments and suggestions by the referees 
for this article, which spurred deeper analyses and new findings. The computation was 
carried out at the Math. Computing Center of IPM (http://math.ipm.ac.ir/mcc). 



References 



[1] R. B. Potts. Some generalized order-disorder transformations. Mathematical Proceedings of the Cambridge Philosophical Society, 48:106-109, 1952. 
[2] F. Y. Wu. The Potts model. Rev. Mod. Phys., 54(1):235-268, 1982. 
[3] Daniel J. Amit, Hanoch Gutfreund, and H. Sompolinsky. Storing infinite numbers of patterns in a spin-glass model of neural networks. Phys. Rev. Lett., 55(14):1530-1533, 1985. 
[4] Daniel J. Amit, Hanoch Gutfreund, and H. Sompolinsky. Spin-glass models of neural networks. Phys. Rev. A, 32(2):1007-1018, 1985. 
[5] Ido Kanter. Potts-glass models of neural networks. Phys. Rev. A, 37(7):2739-2742, 1988. 
[6] Daxing Xiong and Hong Zhao. Estimates of storage capacity in the q-state Potts-glass neural network. J. Phys. A: Math. Theor., 43(44):445001, 2010. 
[7] Emilio Kropff and Alessandro Treves. The storage capacity of Potts models for semantic memory retrieval. J. Stat. Mech., 43:08010, 2005. 
[8] Matthias Löwe and Franck Vermet. The capacity of q-state Potts neural networks with parallel retrieval dynamics. Statistics & Probability Letters, 77:1505-1514, 2007. 
[9] D. Bollé, R. Cools, P. Dupont, and J. Huyghebaert. Mean-field theory for the q-state Potts-glass neural network with biased patterns. J. Phys. A: Math. Gen., 26(3):549, 1993. 
[10] Elad Schneidman, Michael J. Berry, Ronen Segev, and William Bialek. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087):1007-1012, 2006. 
[11] Yasser Roudi, Joanna Tyrcha, and John Hertz. Ising model for neural data: Model quality and approximate methods for extracting functional connectivity. Phys. Rev. E, 79(5):051915, 2009. 
[12] David Mumford and Agnès Desolneux. Pattern Theory. A K Peters, Ltd., 2010. 
[13] Marc D. Hauser, Noam Chomsky, and W. Tecumseh Fitch. The Faculty of Language: What Is It, Who Has It, and How Did It Evolve? Science, 298(5598):1569-1579, 2002. 
[14] Alessandro Treves. Frontal latching networks: a possible neural basis for infinite recursion. Cognitive Neuropsychology, 22:276-291, 2005. 
[15] Alessandro Treves and Yasser Roudi. Of the evolution of the brain. 2004. http://people.sissa.it/~ale/TrevesRoudi.pdf, visited on 11/02/2010. 
[16] Emilio Kropff and Alessandro Treves. The complexity of latching transitions in large scale cortical networks. Nat. Comput., 6(2):169-185, 2007. 
[17] Eleonora Russo, Vijay M K Namboodiri, Alessandro Treves, and Emilio Kropff. Free association transitions in models of cortical latching dynamics. New J. Phys., 10:015008, 2008. 
[18] Emilio Kropff and Alessandro Treves. Uninformative memories will prevail: the storage of correlated representations and its consequences. HFSP J., 1:249-262, 2007. 
[19] Ruben Moreno-Bote, John Rinzel, and Nava Rubin. Noise-Induced Alternations in an Attractor Network Model of Perceptual Bistability. J Neurophysiol, 98(3):1125-1139, 2007. 
[20] R J Baxter. Potts model at the critical temperature. Journal of Physics C: Solid State Physics, 6(23):L445, 1973. 
[21] Gašper Tkačik, Elad Schneidman, Michael J. Berry II, and William Bialek. Ising models for networks of real neurons, 2008. arXiv:q-bio/0611072v1 [q-bio.NC]. 
[22] Arvind Kumar, Stefan Rotter, and Ad Aertsen. Spiking activity propagation in neuronal networks: reconciling different perspectives on neural coding. Nat Rev Neurosci, 11(9):615-627, 2010. 



